Homology Search with Fragmented Nucleic Acid Sequence Patterns

Authors

  • Axel Mosig
  • Julian J.-L. Chen
  • Peter F. Stadler

Abstract

The comprehensive annotation of non-coding RNAs in newly sequenced genomes is still a largely unsolved problem because many functional RNAs exhibit not only poorly conserved sequences but also large variability in structure. In many cases, such as Y RNAs, vault RNAs, or telomerase RNAs, sequences differ by large insertions or deletions and have only a few small sequence patterns in common. Here we present fragrep2, a purely sequence-based approach to detect such patterns in complete genomes. A fragrep2 pattern consists of an ordered list of position-specific weight matrices (PWMs) describing short, approximately conserved sequence elements that are separated by intervals of non-conserved sequence of bounded length. The program uses a fractional programming approach to align the PWMs to genomic DNA, allowing for a bounded number of insertions and deletions in the patterns. These matches are then combined into significant combinations of PWMs. At this step, a subset of PWMs may be deleted, i.e., have no match in the current region of the genome. The program furthermore estimates p- and E-values for the matches. We apply fragrep2 to homology searches for RNase MRP, unveiling two previously unidentified matches as well as reproducing the results of two previous surveys. Furthermore, we complement the picture of vertebrate vault RNAs, a class of ncRNAs that has not received much attention so far.
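The pattern model described in the abstract (an ordered list of PWMs separated by gaps of bounded length) can be illustrated with a small Python sketch. This is not the fragrep2 implementation: it scores ungapped PWM windows only, omits the fractional programming alignment with insertions and deletions, does not allow PWMs to be deleted, and computes no p- or E-values. The function names, the uniform background, the pseudocount, and the score thresholds are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def pwm_from_sites(sites, pseudocount=1.0):
    """Build a log-odds PWM (uniform background assumed) from aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return pwm

def pwm_hits(pwm, seq, threshold, unknown_score=-2.0):
    """Start positions whose ungapped window score reaches the threshold."""
    hits = []
    for start in range(len(seq) - len(pwm) + 1):
        # Characters outside ACGT (e.g. N) get a fixed penalty (assumption).
        score = sum(col.get(seq[start + i], unknown_score)
                    for i, col in enumerate(pwm))
        if score >= threshold:
            hits.append(start)
    return hits

def find_fragmented_matches(seq, pwms, thresholds, gaps):
    """Chain PWM hits in order; gaps[k] = (min, max) unmatched bases
    between the end of PWM k and the start of PWM k+1."""
    hit_lists = [pwm_hits(p, seq, t) for p, t in zip(pwms, thresholds)]
    chains = [(s,) for s in hit_lists[0]]
    for k in range(1, len(pwms)):
        lo, hi = gaps[k - 1]
        prev_len = len(pwms[k - 1])
        extended = []
        for chain in chains:
            prev_end = chain[-1] + prev_len
            for s in hit_lists[k]:
                if lo <= s - prev_end <= hi:
                    extended.append(chain + (s,))
        chains = extended
    return chains

# Toy example: two short conserved elements separated by 2 to 10 bases.
pwm1 = pwm_from_sites(["GGCC", "GGCC", "GGCT"])
pwm2 = pwm_from_sites(["TTAA", "TTAA"])
genome = "ACGTGGCCAAAAATTAAGC"
matches = find_fragmented_matches(genome, [pwm1, pwm2], [3.0, 3.0], [(2, 10)])
print(matches)
```

Each returned tuple holds the start position of every PWM in one chained match. A real search would also scan the reverse complement and rank the chains by significance, which this sketch leaves out.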





Publication date: 2007